Hi, I’m Julien Gamba, a PhD student at the IMDEA Networks Institute in Madrid (Spain). I am officially kicking off my project as a Digital Lab Fellow, working on making sense of the Android supply chain.
The Android Open Source Project (AOSP) was first released by Google in 2008, and Android has since become the most widely used operating system, with more than 2.5 billion users worldwide as of 2019. Because Android is open source, any device manufacturer can modify and adapt it to its specific needs, or add proprietary features before installing it on its devices. Vendors do so mainly to add value to their products and to distinguish themselves from the competition.
This has created a vast supply chain that is completely opaque to users. It includes device manufacturers, resellers, chipset manufacturers, network operators, and prominent actors of the online industry that partner with vendors. Each of these stakeholders can pre-install extra applications or implement proprietary features at the framework level. Pre-installing an application on the system can be very valuable, as these applications are privileged by the system and can therefore access system APIs or personal data more easily than applications installed in user space. However, such customizations can create privacy and security threats. From a user's perspective, this raises a number of questions: if I buy a new phone, who had access to it before me? Will my data stay private, or will it be shared with whoever has a partnership with my phone vendor? How can I know who has installed apps on my device, and how can I assert my digital rights?
My goal is to shed light on the Android supply chain and explore the attribution, privacy and security aspects of this ecosystem. This starts with improving existing tools so that they can work on pre-installed apps. At the time of writing, there is no technical solution specifically designed to analyze the code of pre-installed applications, to run them in an emulated environment and inspect their network activity, or even to attribute a given pre-installed app to the company that developed it. Pre-installed apps differ from the regular apps you know from the Google Play store in that they can rely on different, more obscure features of the Android operating system (for instance, custom permissions or shared user IDs) that are difficult for user-installed applications to use. This makes the tools currently available to researchers difficult to use, at best.
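To make the two features mentioned above more concrete, here is a minimal sketch in Python of how one could spot them in an app's manifest. It assumes the `AndroidManifest.xml` has already been decoded to plain XML (inside an APK it is stored in a binary format), and the function name `manifest_features` and the returned dictionary layout are hypothetical choices for illustration, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the android: attribute prefix in manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def manifest_features(manifest_xml: str) -> dict:
    """Extract a few attribution-relevant fields from a decoded manifest.

    Hypothetical helper: assumes `manifest_xml` is plain-text XML, not the
    binary XML found inside an APK.
    """
    root = ET.fromstring(manifest_xml)

    def android_attr(element, name):
        # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
        return element.get(f"{{{ANDROID_NS}}}{name}")

    return {
        "package": root.get("package"),
        # Apps declaring the same shared user ID (and signed with the same
        # certificate) can run under the same Linux user.
        "shared_user_id": android_attr(root, "sharedUserId"),
        # <permission> elements declare new (custom) permissions.
        "declared_permissions": [
            android_attr(p, "name") for p in root.findall("permission")
        ],
        # <uses-permission> elements request existing permissions.
        "requested_permissions": [
            android_attr(p, "name") for p in root.findall("uses-permission")
        ],
    }
```

A non-empty `shared_user_id` or `declared_permissions` entry is a hint that the app relies on one of these less common features and may need special handling by analysis tools.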
Once these tools are ready, the second objective will be to design reliable ways to identify developers. We can do that by extracting multiple signals from the applications (for instance, information from the certificates used to sign the apps, the brand of the device on which the app was pre-installed, and so on). By correlating these signals across a large variety of apps and device brands, it will be possible to identify the developer of a given app with high confidence and improve the traceability of the software. This is a crucial step: what good is it to know that an app uses your private data if you cannot do anything about it? By knowing who is behind this data collection, users can take action and protect their privacy.
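As a rough illustration of what correlating signals could look like, the sketch below takes a simple majority vote over the organization name found in signing certificates. This is only an assumption about one possible approach, not the method the project will use: the `AppObservation` schema, the `attribute_developer` function and the confidence score are all hypothetical, and the sketch ignores the fact that certificates can be self-signed with arbitrary names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AppObservation:
    """One sighting of a pre-installed app on a device (hypothetical schema)."""
    package: str       # package name, e.g. "com.example.weather"
    cert_org: str      # organization field of the signing certificate subject
    device_brand: str  # brand of the device the app was found on


def attribute_developer(observations, package):
    """Guess the developer of `package` by majority vote over `cert_org`.

    Returns (candidate, confidence, brands), where confidence is the share
    of sightings agreeing with the winning candidate and brands is the set
    of device brands on which the package was seen. Returns
    (None, 0.0, set()) if the package was never observed with a usable name.
    """
    sightings = [
        o for o in observations
        if o.package == package and o.cert_org.strip()
    ]
    if not sightings:
        return None, 0.0, set()

    # Normalize names so trivial spelling variants count as the same vote.
    votes = Counter(o.cert_org.strip().lower() for o in sightings)
    candidate, count = votes.most_common(1)[0]
    brands = {o.device_brand for o in sightings}
    return candidate, count / len(sightings), brands
```

A low confidence value, or the same package appearing across many unrelated brands, would be a sign that the certificate alone is not enough and that more signals are needed.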
Overall, my goal is simply to bring more attention to the issues surrounding the Android supply chain. This ecosystem is completely hidden from users, and it ends up creating an environment that is detrimental to their privacy. My hope is that with more transparency, the Android ecosystem will in turn become safer for users.